# Cross-modal understanding
Qwen2.5 Omni 7B GGUF
Other
Qwen2.5-Omni-7B-GGUF is the GGUF format version of the Qwen2.5-Omni-7B model, supporting multimodal inputs including text, audio, and images.
Large Language Model, English
ggml-org
319
3
VITA 1.5
VITA-1.5 is a multimodal interaction model designed to achieve GPT-4o-level real-time vision and voice interaction.
VITA-MLLM
345
40
CSUMLM
Apache-2.0
CSUMLM is an artificial intelligence system that combines multimodal AI engines with large language models, featuring multimodal processing, complex language understanding, and real-time learning capabilities.
Multimodal Fusion
Transformers, Supports Multiple Languages

Or4cl3-1
35
1
Veld Base
Apache-2.0
Veld Base is a pre-trained visual encoder-text decoder model that supports Korean and English.
Image-to-Text
Transformers, Supports Multiple Languages

KETI-AIR
40
0